Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting
نویسندگان
چکیده
منابع مشابه
A Natural Policy Gradient
We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be...
متن کاملExperiments with Infinite-Horizon, Policy-Gradient Estimation
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm’s chief advantages are that it uses only one free param...
متن کاملInfinite-Horizon Policy-Gradient Estimation
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward ...
متن کاملAnalysis and Improvement of Policy Gradient Estimation
Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from instability of gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method u...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEICE Transactions on Information and Systems
سال: 2018
ISSN: 0916-8532,1745-1361
DOI: 10.1587/transinf.2017edp7363